@DrakeLin DrakeLin commented Jan 20, 2026

🥞 Stacked PR

Use this link to review incremental changes.


What changes are proposed in this pull request?

Integrates the stats transform module into checkpoint_data() to automatically populate stats and stats_parsed fields based on table configuration when writing checkpoints.
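
As a rough illustration of what "based on table configuration" could look like: the Delta protocol defines the table properties `delta.checkpoint.writeStatsAsJson` and `delta.checkpoint.writeStatsAsStruct`, which govern whether a checkpoint carries `stats` (a JSON string) and `stats_parsed` (a struct). The sketch below assumes those are the properties consulted. `StatsConfig` and `from_properties` are hypothetical names, not the kernel's API, and the fallback values used when a property is absent are assumptions of this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical: which statistics columns a checkpoint writer should populate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StatsConfig {
    /// Populate the JSON-encoded `stats` field.
    pub write_json: bool,
    /// Populate the struct-typed `stats_parsed` field.
    pub write_struct: bool,
}

impl StatsConfig {
    /// Reads the two checkpoint properties from a table's property map.
    /// Missing or unparseable values fall back to the defaults assumed here.
    pub fn from_properties(props: &HashMap<String, String>) -> Self {
        let flag = |key: &str, default: bool| {
            props
                .get(key)
                .and_then(|v| v.trim().parse::<bool>().ok())
                .unwrap_or(default)
        };
        StatsConfig {
            write_json: flag("delta.checkpoint.writeStatsAsJson", true),
            write_struct: flag("delta.checkpoint.writeStatsAsStruct", false),
        }
    }
}
```

A transform built from such a config would then keep, drop, or derive each of the two fields independently.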

This PR affects the following public APIs

New APIs (non-breaking)

  1. CheckpointDataIterator trait - New public trait for checkpoint data iterators
pub trait CheckpointDataIterator: Iterator<Item = DeltaResult<FilteredEngineData>> {
    fn output_schema(&self) -> &SchemaRef;
    fn is_exhausted(&self) -> bool;
    fn actions_count(&self) -> i64;
    fn add_actions_count(&self) -> i64;
}
  2. TransformingCheckpointIterator struct - New public struct implementing CheckpointDataIterator

Breaking Changes

  1. CheckpointWriter::checkpoint_data() return type changed

    • Before: -> DeltaResult
    • After: -> DeltaResult
  2. CheckpointWriter::finalize() parameter type changed

    • Before: checkpoint_data: ActionReconciliationIterator
    • After: checkpoint_data: impl CheckpointDataIterator

Migration
Callers should update to use the new CheckpointDataIterator trait. The trait provides all methods previously available on ActionReconciliationIterator (is_exhausted(), actions_count(), add_actions_count()), plus the new output_schema() method for obtaining the write schema.
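
A minimal sketch of the caller-side pattern after migration, assuming stand-in types so it compiles on its own. `DeltaResult`, `SchemaRef`, and `FilteredEngineData` below are simplified placeholders for the kernel types, and `InMemoryCheckpointData` and `drain_checkpoint` are hypothetical names used only for illustration. Only the trait's method list is taken from this description.

```rust
use std::sync::Arc;

// Simplified placeholders for the kernel types (assumptions of this sketch).
pub type DeltaResult<T> = Result<T, String>;
pub type SchemaRef = Arc<Vec<String>>; // only column names here
pub type FilteredEngineData = Vec<i64>; // one "batch" of rows

// Trait methods as listed in the PR description.
pub trait CheckpointDataIterator: Iterator<Item = DeltaResult<FilteredEngineData>> {
    fn output_schema(&self) -> &SchemaRef;
    fn is_exhausted(&self) -> bool;
    fn actions_count(&self) -> i64;
    fn add_actions_count(&self) -> i64;
}

/// Hypothetical in-memory implementor, used only to exercise the caller code.
pub struct InMemoryCheckpointData {
    schema: SchemaRef,
    batches: std::vec::IntoIter<FilteredEngineData>,
    exhausted: bool,
    actions: i64,
}

impl InMemoryCheckpointData {
    pub fn new(columns: &[&str], batches: Vec<FilteredEngineData>) -> Self {
        Self {
            schema: Arc::new(columns.iter().map(|c| c.to_string()).collect()),
            batches: batches.into_iter(),
            exhausted: false,
            actions: 0,
        }
    }
}

impl Iterator for InMemoryCheckpointData {
    type Item = DeltaResult<FilteredEngineData>;

    fn next(&mut self) -> Option<Self::Item> {
        match self.batches.next() {
            Some(batch) => {
                self.actions += batch.len() as i64;
                Some(Ok(batch))
            }
            None => {
                self.exhausted = true;
                None
            }
        }
    }
}

impl CheckpointDataIterator for InMemoryCheckpointData {
    fn output_schema(&self) -> &SchemaRef { &self.schema }
    fn is_exhausted(&self) -> bool { self.exhausted }
    fn actions_count(&self) -> i64 { self.actions }
    // In this toy implementor every row counts as an add action.
    fn add_actions_count(&self) -> i64 { self.actions }
}

/// Reads the write schema up front, drains the iterator by reference so it
/// stays usable afterwards, then checks exhaustion and reads the counters.
pub fn drain_checkpoint(
    mut data: impl CheckpointDataIterator,
) -> DeltaResult<(Vec<String>, usize, i64)> {
    let columns = data.output_schema().to_vec();
    let mut batches_written = 0;
    for batch in data.by_ref() {
        let _batch = batch?; // a real caller would hand this to its file writer
        batches_written += 1;
    }
    if !data.is_exhausted() {
        return Err("checkpoint data was not fully consumed".to_string());
    }
    Ok((columns, batches_written, data.actions_count()))
}
```

Iterating through `by_ref()` matters because the iterator itself still has to be passed on afterwards (to `finalize()` in the real API), so the write loop must not consume it.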

How was this change tested?

Integration tests

@github-actions github-actions bot added the breaking-change label (a change that requires a major version bump) Jan 20, 2026
@DrakeLin DrakeLin requested review from dengsh12 and nicklan January 21, 2026 00:26
@DrakeLin DrakeLin marked this pull request as ready for review January 21, 2026 00:26
@DrakeLin DrakeLin changed the title from "done" to "feat: integrate stats transforms into checkpoint data generation" Jan 21, 2026
@github-actions github-actions bot removed the breaking-change label Jan 21, 2026
codecov bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 93.35347% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.76%. Comparing base (d4ecc0a) to head (03a8e93).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| kernel/src/checkpoint/stats_transform.rs | 94.77% | 11 Missing and 2 partials ⚠️ |
| kernel/src/checkpoint/mod.rs | 89.02% | 1 Missing and 8 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1643      +/-   ##
==========================================
+ Coverage   84.65%   84.76%   +0.11%     
==========================================
  Files         123      124       +1     
  Lines       34109    34418     +309     
  Branches    34109    34418     +309     
==========================================
+ Hits        28875    29175     +300     
+ Misses       3905     3904       -1     
- Partials     1329     1339      +10     


@DrakeLin DrakeLin force-pushed the stack/write-stats branch 2 times, most recently from c0d47c1 to 9df9500 Compare January 21, 2026 23:38
@github-actions github-actions bot added the breaking-change label Jan 21, 2026
DrakeLin added a commit that referenced this pull request Jan 23, 2026
## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta-kernel-rs/pull/1645/files) to review incremental changes.
- [**stack/null-propagation**](#1645) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1645/files)]
- [stack/coalesce](#1648) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1648/files/37e755009566511bf7c2f00e014c1647e77e4533..d64042f7908844ef2d8a1c68312dc3ff936d60dc)]
- [stack/checkpoint-transforms](#1646) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1646/files/d64042f7908844ef2d8a1c68312dc3ff936d60dc..4e66ca004f89b23431a96ac106a9c0d400718b10)]
- [stack/write-stats](#1643) [[Files changed](https://github.com/delta-io/delta-kernel-rs/pull/1643/files/4e66ca004f89b23431a96ac106a9c0d400718b10..cd64f79fd3b40ebfa811cb333369cb17aa1a2a74)]

---------
## What changes are proposed in this pull request?

Fixes a bug in nested transform expression evaluation where null rows in
the source struct were losing their null bitmap, causing null structs to
incorrectly appear as non-null structs with null fields.

When evaluating nested transform expressions (transforms with an
input_path that operate on a nested struct), the output StructArray was
created with None for the null buffer:
`let data = StructArray::try_new(output_fields.into(), output_cols, None)?;`
This meant that if the source struct had null rows (e.g., an add action
that is null in a checkpoint batch), the output would lose that null
information. The struct would appear as non-null but with all-null
fields, which is semantically different.
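
The difference can be sketched with a simplified model. This is not Arrow's API: `StructColumn` is a hypothetical stand-in with one child column and an optional validity mask, where `None` means "every row is valid", loosely mirroring a struct array's null buffer. The actual fix is not shown in this message. The sketch assumes it amounts to carrying the source struct's validity over to the output instead of passing `None`.

```rust
/// Hypothetical stand-in for a struct column: one child column plus an
/// optional validity mask (None means every row is a valid struct).
#[derive(Debug, Clone, PartialEq)]
pub struct StructColumn {
    pub child: Vec<Option<i64>>,
    pub validity: Option<Vec<bool>>,
}

impl StructColumn {
    /// A row is a null struct only if the validity mask marks it invalid.
    pub fn is_null(&self, row: usize) -> bool {
        self.validity.as_ref().map_or(false, |v| !v[row])
    }
}

/// Applies a toy transform (+1) to the child values.
fn transform_child(src: &StructColumn) -> Vec<Option<i64>> {
    src.child.iter().map(|v| v.map(|x| x + 1)).collect()
}

/// Buggy shape: the output struct is rebuilt without any validity mask,
/// so a null source row becomes a non-null struct with a null field.
pub fn transform_dropping_nulls(src: &StructColumn) -> StructColumn {
    StructColumn { child: transform_child(src), validity: None }
}

/// Assumed fixed shape: the source validity mask is carried over.
pub fn transform_keeping_nulls(src: &StructColumn) -> StructColumn {
    StructColumn { child: transform_child(src), validity: src.validity.clone() }
}
```

With a source whose second row is a null struct, `transform_dropping_nulls` reports that row as non-null with a null child, while `transform_keeping_nulls` still reports it as a null struct.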

## How was this change tested?
Existing transform tests pass. The stats transform integration tests (in
a follow-up PR) exercise this code path.
@DrakeLin DrakeLin force-pushed the stack/write-stats branch 3 times, most recently from ad7ef99 to e2ba315 Compare January 23, 2026 05:01
@github-actions github-actions bot removed the breaking-change label Jan 23, 2026